This report examines the UKgas dataset, which contains
quarterly gas consumption in the United Kingdom. The goal is to
understand the main time series patterns in the data, build a reasonable
seasonal ARIMA model, compare a few plausible alternatives, and generate
short-term forecasts.
The analysis follows a standard Box–Jenkins workflow: exploratory plotting, variance stabilization, differencing to achieve stationarity, ACF/PACF-based model identification, candidate model comparison, residual diagnostics, and forecasting.
UKgas is included in base R’s datasets
package. It is a quarterly series, so any seasonal model should account
for a repeating pattern every 4 periods.
plot(UKgas,
main = "UK Gas Consumption Over Time",
ylab = "Gas Consumption",
xlab = "Year")
The raw series shows three features immediately. First, there is a strong long-run upward trend. Second, the within-year pattern is pronounced, which points to strong quarterly seasonality. Third, the seasonal swings become larger over time, suggesting that the variance increases with the level of the series.
Together, these patterns indicate that the original series is not stationary and that a transformation is likely needed before modeling.
log_gas <- log(UKgas)
plot(log_gas,
main = "Log-Transformed UK Gas Consumption",
ylab = "log(UKgas)",
xlab = "Year")
A logarithmic transformation is a natural next step because it stabilizes the changing amplitude in the seasonal pattern. After transformation, the seasonal variation becomes more even across time, which makes the series more suitable for ARIMA-type modeling. The trend and seasonality are still present, so differencing is still required.
diff_log_gas <- diff(log_gas)
plot(diff_log_gas,
main = "First Difference of log(UKgas)",
ylab = "Differenced log(UKgas)",
xlab = "Year")
diff_seasonal_log <- diff(log_gas, lag = 4)
plot(diff_seasonal_log,
main = "Seasonal Difference of log(UKgas) at Lag 4",
ylab = "Seasonally Differenced log(UKgas)",
xlab = "Year")
diff_both <- diff(diff(log_gas, lag = 4))
plot(diff_both,
main = "Combined Regular and Seasonal Differencing",
ylab = "Differenced log(UKgas)",
xlab = "Year")
The first difference reduces the long-run trend, but visible seasonality remains. Seasonal differencing at lag 4 removes most of the quarterly structure. Applying both transformations produces a series that fluctuates around a roughly constant mean with much more stable variability.
The transformation used for modeling is:
\[ (1-B)(1-B^4)\log(UKgas) \]
where \(B\) is the backshift operator. This transformed series appears plausibly stationary and is the series used for model identification.
par(mfrow = c(1, 2))
acf(diff_both, main = "ACF of Differenced log(UKgas)")
pacf(diff_both, main = "PACF of Differenced log(UKgas)")
par(mfrow = c(1, 1))
The ACF shows a strong negative spike at lag 1 and a noticeable seasonal feature at lag 4. The PACF does not show a clean cutoff but instead tails off across the early lags. That pattern is more consistent with moving-average behavior than with a strong autoregressive cutoff.
Based on these plots, it is reasonable to start with models that include a non-seasonal MA(1) term and either a seasonal MA(1) or seasonal AR(1) term.
auto_model <- auto.arima(
log_gas,
seasonal = TRUE,
stepwise = FALSE,
approximation = FALSE
)
auto_model
## Series: log_gas
## ARIMA(3,0,1)(0,1,0)[4] with drift
##
## Coefficients:
## ar1 ar2 ar3 ma1 drift
## -0.9753 -0.1429 0.1421 0.8809 0.0164
## s.e. 0.1260 0.1380 0.1056 0.0898 0.0023
##
## sigma^2 = 0.01029: log likelihood = 92.66
## AIC=-173.32 AICc=-172.45 BIC=-157.45
auto.arima() serves as a useful benchmark, but it is not
the final word. It chooses a model by optimizing an information
criterion over many possibilities, while the manual candidate models
below are guided by the differenced series, the ACF/PACF structure, and
model interpretability.
model1 <- capture.output(
fit1 <- sarima(log_gas, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 4)
)
print(model1[grep("Coefficients", model1):length(model1)])
## [1] "Coefficients: "
## [2] " Estimate SE t.value p.value"
## [3] "ma1 -0.9192 0.0455 -20.1994 0.0000"
## [4] "sma1 -0.2353 0.1028 -2.2891 0.0242"
## [5] ""
## [6] "sigma^2 estimated as 0.01097285 on 101 degrees of freedom "
## [7] " "
## [8] "AIC = -1.592327 AICc = -1.591161 BIC = -1.515587 "
## [9] " "
The fitted model is:
\[ (1-B)(1-B^4)\log(UKgas_t) = (1 - 0.9192B)(1 - 0.2353B^4)\varepsilon_t \]
This is the most classical specification for this series. It aligns closely with the ACF/PACF evidence and keeps the model fairly compact. Both fitted parameters are statistically significant, and the diagnostic plots suggest that most of the serial dependence has been removed.
model2 <- capture.output(
fit2 <- sarima(log_gas, p = 1, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 4)
)
print(model2[grep("Coefficients", model2):length(model2)])
## [1] "Coefficients: "
## [2] " Estimate SE t.value p.value"
## [3] "ar1 -0.2032 0.1052 -1.9314 0.0563"
## [4] "ma1 -0.8868 0.0524 -16.9267 0.0000"
## [5] "sma1 -0.2027 0.1042 -1.9443 0.0547"
## [6] ""
## [7] "sigma^2 estimated as 0.01061266 on 100 degrees of freedom "
## [8] " "
## [9] "AIC = -1.607326 AICc = -1.604972 BIC = -1.505006 "
## [10] " "
The fitted model is:
\[ (1 + 0.2032B)(1-B)(1-B^4)\log(UKgas_t) = (1 - 0.8868B)(1 - 0.2027B^4)\varepsilon_t \]
This model adds a non-seasonal AR(1) term. It achieves the lowest AIC among the three candidates, so it deserves serious consideration. At the same time, the AR(1) and seasonal MA(1) terms are only marginally significant, which weakens the case for the extra complexity.
model3 <- capture.output(
fit3 <- sarima(log_gas, p = 0, d = 1, q = 1, P = 1, D = 1, Q = 0, S = 4)
)
print(model3[grep("Coefficients", model3):length(model3)])
## [1] "Coefficients: "
## [2] " Estimate SE t.value p.value"
## [3] "ma1 -0.9260 0.0446 -20.7497 0.0000"
## [4] "sar1 -0.2278 0.0991 -2.2980 0.0236"
## [5] ""
## [6] "sigma^2 estimated as 0.01097152 on 101 degrees of freedom "
## [7] " "
## [8] "AIC = -1.592337 AICc = -1.591172 BIC = -1.515597 "
## [9] " "
The fitted model is:
\[ (1-B)(1-B^4)\log(UKgas_t) = (1 - 0.9260B)(1 - 0.2278B^4)\varepsilon_t \]
This model replaces the seasonal MA term with a seasonal AR term. Its overall fit is very close to Model 1, and both parameters are significant. It is a reasonable alternative, though it does not clearly outperform the simpler seasonal MA specification.
Across the candidate models, the standardized residuals fluctuate around zero without an obvious trend or structure. The residual ACF plots show that most autocorrelations fall within the confidence bounds, and the Ljung–Box p-values are generally acceptable at moderate lags. The Q–Q plots also suggest approximate normality, with only minor deviations in the tails.
That combination of results indicates that all three models are broadly workable, but the choice should still depend on fit, parsimony, and coefficient stability.
Model 2 has the lowest AIC, which means it provides the best fit by that metric alone. However, the gain over Model 1 is modest, and two of its parameters are only marginally significant. Model 3 performs almost identically to Model 1 in terms of AIC but is not clearly simpler or more interpretable.
For a portfolio report, the main point is not to claim a single universally correct model, but to show a defensible selection process. Here, the simplest strong choice is Model 1.
The final model selected for forecasting is:
\[ \text{SARIMA}(0,1,1)(0,1,1)_{4} \]
This model is preferred because it is parsimonious, statistically well-supported, and diagnostically adequate. It captures both short-term dependence and quarterly seasonal structure without adding borderline terms.
forecast_result <- sarima.for(
log_gas,
n.ahead = 12,
p = 0, d = 1, q = 1,
P = 0, D = 1, Q = 1,
S = 4,
ylab = "Log UK Gas Consumption",
main = "12-Quarter Forecast from SARIMA(0,1,1)(0,1,1)[4]"
)
The selected model is used to forecast the next 12 quarters, or roughly three years ahead. The forecast plot includes the point forecasts together with 95% prediction intervals.
The forecast preserves the key structure seen in the historical data: an overall upward level together with a repeating quarterly seasonal pattern. The prediction intervals widen as the forecast horizon increases, which is expected because uncertainty accumulates the further ahead the model projects.
Because the model was fitted on the log scale, the forecast should be interpreted as describing multiplicative growth and seasonality in the original series. For short-horizon forecasting, the model appears to provide a reasonable and interpretable summary of the series dynamics.
This analysis shows how a seasonal time series can be modeled using a practical Box–Jenkins workflow. Starting from the raw UK gas consumption data, the report moved through transformation, differencing, model identification, diagnostic checking, and forecasting.
The final result is a compact SARIMA model that captures both the short-run and seasonal structure of the series and produces credible short-term forecasts. From a portfolio perspective, the project demonstrates a full time series workflow rather than just a final answer: diagnosing non-stationarity, comparing plausible alternatives, and selecting a model for defensible reasons.